151 research outputs found

    NFU-Enabled FASTA: moving bioinformatics applications onto wide area networks

    BACKGROUND: Advances in Internet technologies have allowed life science researchers to reach beyond the lab-centric research paradigm to create distributed collaborations. Of the existing technologies that support distributed collaborations, there are currently none that simultaneously support data storage and computation as a shared network resource, enabling computational burden to be wholly removed from participating clients. Software using computation-enabled logistical networking components of the Internet Backplane Protocol provides a suitable means to accomplish these tasks. Here, we demonstrate software that enables this approach by distributing both the FASTA algorithm and appropriate data sets within the framework of a wide area network. RESULTS: For large datasets, computation-enabled logistical networks provide a significant reduction in FASTA algorithm running time over local and non-distributed logistical networking frameworks. We also find that genome-scale sizes of the stored data are easily adaptable to logistical networks. CONCLUSION: Network function unit-enabled Internet Backplane Protocol effectively distributes FASTA algorithm computation over large data sets stored within the scalable network. In situations where computation is subject to parallel solution over very large data sets, this approach provides a means to allow distributed collaborators access to a shared storage resource capable of storing the large volumes of data equated with modern life science. In addition, it provides a computation framework that removes the burden of computation from the client and places it within the network.

    MuTrack: a genome analysis system for large-scale mutagenesis in the mouse

    BACKGROUND: Modern biological research makes possible the comprehensive study and development of heritable mutations in the mouse model at high-throughput. Using techniques spanning genetics, molecular biology, histology, and behavioral science, researchers may examine, with varying degrees of granularity, numerous phenotypic aspects of mutant mouse strains directly pertinent to human disease states. Success of these and other genome-wide endeavors relies on a well-structured bioinformatics core that brings together investigators from widely dispersed institutions and enables them to seamlessly integrate data, observations and discussions. DESCRIPTION: MuTrack was developed as the bioinformatics core for a large mouse phenotype screening effort. It is a comprehensive collection of on-line computational tools and tracks thousands of mutagenized mice from birth through senescence and death. It identifies the physical location of mice during an intensive phenotype screening process at several locations throughout the state of Tennessee and collects raw and processed experimental data from each domain. MuTrack's statistical package allows researchers to access a real-time analysis of mouse pedigrees for aberrant behavior, and subsequent recirculation and retesting. The end result is the classification of potential and actual heritable mutant mouse strains that become immediately available to outside researchers who have expressed interest in the mutant phenotype. CONCLUSION: MuTrack demonstrates the effectiveness of using bioinformatics techniques in data collection, integration and analysis to identify unique result sets that are beyond the capacity of a solitary laboratory. By employing the research expertise of investigators at several institutions for a broad-ranging study, the TMGC has amplified the effectiveness of any one consortium member. The bioinformatics strategy presented here lends future collaborative efforts a template for a comprehensive approach to large-scale analysis.

    Integration of heterogeneous functional genomics data in gerontology research to find genes and pathway underlying aging across species.

    Understanding the biological mechanisms behind aging, lifespan and healthspan is becoming increasingly important as the proportion of the world's population over the age of 65 grows, along with the cost and complexity of their care. BigData oriented approaches and analysis methods enable current and future bio-gerontologists to synthesize, distill and interpret vast, heterogeneous data from functional genomics studies of aging. GeneWeaver is an analysis system for integration of data that allows investigators to store, search, and analyze immense amounts of data including user-submitted experimental data, data from primary publications, and data in other databases. Aging related genome-wide gene sets from primary publications were curated into this system in concert with data from other model-organism and aging-specific databases, and applied to several questions in gerontology. For example, we identified Cd63 as a frequently represented gene among aging-related genome-wide results. To evaluate the role of Cd63 in aging, we performed RNAi knockdown of the C. elegans ortholog, tsp-7, demonstrating that this manipulation is capable of extending lifespan. The tools in GeneWeaver enable aging researchers to make new discoveries into the associations between the genes, normal biological processes, and diseases that affect aging, healthspan, and lifespan.

    Curating gene sets: challenges and opportunities for integrative analysis.

    Genomic data interpretation often requires analyses that move from a gene-by-gene focus to a focus on sets of genes that are associated with biological phenomena such as molecular processes, phenotypes, diseases, drug interactions or environmental conditions. Unique challenges exist in the curation of gene sets beyond the challenges in curation of individual genes. Here we highlight a literature curation workflow whereby gene sets are curated from peer-reviewed published data into GeneWeaver (GW), a data repository and analysis platform. We describe the system features that allow for a flexible yet precise curation procedure. We illustrate the value of curation by gene sets through analysis of independently curated sets that relate to the integrated stress response, showing that sets curated from independent sources all share significant Jaccard similarity. A suite of reproducible analysis tools is provided in GW as services to carry out interactive functional investigation of user-submitted gene sets within the context of over 150 000 gene sets constructed from publicly available resources and published gene lists. A curation interface supports the ability of users to design and maintain curation workflows of gene sets, including assigning, reviewing and releasing gene sets within a curation project context.
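
The Jaccard similarity mentioned in this abstract is the size of the intersection of two sets divided by the size of their union. The following is a minimal sketch of that calculation, not GeneWeaver's implementation. The function name `jaccard` and the gene identifiers are hypothetical placeholders, and the sketch does not cover the significance testing the abstract refers to.

```python
def jaccard(a, b):
    """Return |a & b| / |a | b| for two collections of gene identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical gene sets; the identifiers are placeholders, not curated data.
set_a = {"gene1", "gene2", "gene3", "gene4"}
set_b = {"gene1", "gene2", "gene5"}

# Two shared genes out of five distinct genes overall.
print(jaccard(set_a, set_b))  # 0.4
```

A score of 1.0 means the two sets are identical and 0.0 means they share no genes, so the measure is symmetric and independent of the order in which the sets are compared.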

    Interpretation of psychiatric genome-wide association studies with multispecies heterogeneous functional genomic data integration.

    Genome-wide association studies and other discovery genetics methods provide a means to identify previously unknown biological mechanisms underlying behavioral disorders that may point to new therapeutic avenues, augment diagnostic tools, and yield a deeper understanding of the biology of psychiatric conditions. Recent advances in psychiatric genetics have been made possible through large-scale collaborative efforts. These studies have begun to unearth many novel genetic variants associated with psychiatric disorders and behavioral traits in human populations. Significant challenges remain in characterizing the resulting disease-associated genetic variants and prioritizing functional follow-up to make them useful for mechanistic understanding and development of therapeutics. Model organism research has generated extensive genomic data that can provide insight into the neurobiological mechanisms of variant action, but a cohesive effort must be made to establish which aspects of the biological modulation of behavioral traits are evolutionarily conserved across species. Scalable computing, new data integration strategies, and advanced analysis methods outlined in this review provide a framework to efficiently harness model organism data in support of clinically relevant psychiatric phenotypes.

    Cisplatin-resistant triple-negative breast cancer subtypes: multiple mechanisms of resistance.

    BACKGROUND: Understanding mechanisms underlying specific chemotherapeutic responses in subtypes of cancer may improve identification of treatment strategies most likely to benefit particular patients. For example, triple-negative breast cancer (TNBC) patients have variable response to the chemotherapeutic agent cisplatin. Understanding the basis of treatment response in cancer subtypes will lead to more informed decisions about selection of treatment strategies. METHODS: In this study we used an integrative functional genomics approach to investigate the molecular mechanisms underlying known cisplatin-response differences among subtypes of TNBC. To identify changes in gene expression that could explain mechanisms of resistance, we examined 102 evolutionarily conserved cisplatin-associated genes, evaluating their differential expression in the cisplatin-sensitive, basal-like 1 (BL1) and basal-like 2 (BL2) subtypes, and the two cisplatin-resistant, luminal androgen receptor (LAR) and mesenchymal (M) subtypes of TNBC. RESULTS: We found 20 genes that were differentially expressed in at least one subtype. Fifteen of the 20 genes are associated with cell death and are distributed among all TNBC subtypes. The less cisplatin-responsive LAR and M TNBC subtypes show different regulation of 13 genes compared to the more sensitive BL1 and BL2 subtypes. These 13 genes identify a variety of cisplatin-resistance mechanisms including increased transport and detoxification of cisplatin, and mis-regulation of the epithelial to mesenchymal transition. CONCLUSIONS: We identified gene signatures in resistant TNBC subtypes indicative of mechanisms of cisplatin resistance. Our results indicate that response to cisplatin in TNBC has a complex foundation based on impact of treatment on distinct cellular pathways. We find that examination of expression data in the context of heterogeneous data such as drug-gene interactions leads to a better understanding of mechanisms at work in cancer therapy response.

    On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types

    BACKGROUND: Integrating and analyzing heterogeneous genome-scale data is a huge algorithmic challenge for modern systems biology. Bipartite graphs can be useful for representing relationships across pairs of disparate data types, with the interpretation of these relationships accomplished through an enumeration of maximal bicliques. Most previously-known techniques are generally ill-suited to this foundational task, because they are relatively inefficient and without effective scaling. In this paper, a powerful new algorithm is described that produces all maximal bicliques in a bipartite graph. Unlike most previous approaches, the new method neither places undue restrictions on its input nor inflates the problem size. Efficiency is achieved through an innovative exploitation of bipartite graph structure, and through computational reductions that rapidly eliminate non-maximal candidates from the search space. An iterative selection of vertices for consideration based on non-decreasing common neighborhood sizes boosts efficiency and leads to more balanced recursion trees. RESULTS: The new technique is implemented and compared to previously published approaches from graph theory and data mining. Formal time and space bounds are derived. Experiments are performed on both random graphs and graphs constructed from functional genomics data. It is shown that the new method substantially outperforms the best previous alternatives. CONCLUSIONS: The new method is streamlined, efficient, and particularly well-suited to the study of huge and diverse biological data. A robust implementation has been incorporated into GeneWeaver, an online tool for integrating and analyzing functional genomics experiments, available at http://geneweaver.org. The enormous increase in scalability it provides empowers users to study complex and previously unassailable gene-set associations between genes and their biological functions in a hierarchical fashion and on a genome-wide scale. This practical computational resource is adaptable to almost any applications environment in which bipartite graphs can be used to model relationships between pairs of heterogeneous entities. BMC Bioinformatics 2014; 15(1):110
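
To make the term concrete: a biclique in a bipartite graph is a set of left vertices and a set of right vertices in which every left vertex is connected to every right vertex, and it is maximal when no vertex can be added to either side. The sketch below enumerates maximal bicliques by brute force over subsets of the left side. It is not the algorithm described in the abstract and it scales exponentially, so it is only suitable for tiny illustrative inputs. The function name `maximal_bicliques` and the example graph (gene sets on the left, genes on the right) are hypothetical.

```python
from itertools import combinations


def maximal_bicliques(adj):
    """Enumerate maximal bicliques with two non-empty sides.

    adj maps each left vertex to the set of right vertices it is linked to.
    Brute force: exponential in the number of left vertices.
    """
    left = sorted(adj)
    found = set()
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            # Right vertices adjacent to every left vertex in the subset.
            common = set.intersection(*(set(adj[u]) for u in subset))
            if not common:
                continue
            # Close the left side: all left vertices adjacent to all of `common`.
            closed = frozenset(u for u in left if common <= set(adj[u]))
            found.add((closed, frozenset(common)))
    return found


# Hypothetical example: three gene sets (left) and the genes they contain (right).
adj = {
    "set1": {"g1", "g2"},
    "set2": {"g2", "g3"},
    "set3": {"g1", "g2", "g3"},
}

for sets, genes in sorted(maximal_bicliques(adj), key=lambda p: sorted(p[0])):
    print(sorted(sets), sorted(genes))
```

Each reported pair reads as "these gene sets all share exactly these genes", which is the kind of relationship the abstract describes interpreting through biclique enumeration.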

    GeneWeaver: a web-based system for integrative functional genomics

    High-throughput genome technologies have produced a wealth of data on the association of genes and gene products to biological functions. Investigators have discovered value in combining their experimental results with published genome-wide association studies, quantitative trait locus, microarray, RNA-sequencing and mutant phenotyping studies to identify gene-function associations across diverse experiments, species, conditions, behaviors or biological processes. These experimental results are typically derived from disparate data repositories, publication supplements or reconstructions from primary data stores. This leaves bench biologists with the complex and unscalable task of integrating data by identifying and gathering relevant studies, reanalyzing primary data, unifying gene identifiers and applying ad hoc computational analysis to the integrated set. The freely available GeneWeaver (http://www.GeneWeaver.org) powered by the Ontological Discovery Environment is a curated repository of genomic experimental results with an accompanying tool set for dynamic integration of these data sets, enabling users to interactively address questions about sets of biological functions and their relations to sets of genes. Thus, large numbers of independently published genomic results can be organized into new conceptual frameworks driven by the underlying, inferred biological relationships rather than a pre-existing semantic framework. An empirical ‘ontology’ is discovered from the aggregate of experimental knowledge around user-defined areas of biological inquiry.

    Earth BioGenome Project: Sequencing life for the future of life.

    Increasing our understanding of Earth's biodiversity and responsibly stewarding its resources are among the most crucial scientific and social challenges of the new millennium. These challenges require fundamental new knowledge of the organization, evolution, functions, and interactions among millions of the planet's organisms. Herein, we present a perspective on the Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth's eukaryotic biodiversity over a period of 10 years. The outcomes of the EBP will inform a broad range of major issues facing humanity, such as the impact of climate change on biodiversity, the conservation of endangered species and ecosystems, and the preservation and enhancement of ecosystem services. We describe hurdles that the project faces, including data-sharing policies that ensure a permanent, freely available resource for future scientific discovery while respecting access and benefit sharing guidelines of the Nagoya Protocol. We also describe scientific and organizational challenges in executing such an ambitious project, and the structure proposed to achieve the project's goals. The far-reaching potential benefits of creating an open digital repository of genomic information for life on Earth can be realized only by a coordinated international effort.